Learning from human errors: prediction of phoneme confusions based on modified ASR training
Abstract
In an attempt to improve models of human perception, phoneme recognition in nonsense utterances was predicted with automatic speech recognition (ASR) to assess its applicability for modeling human speech recognition (HSR) in noise. In a first set of experiments, several feature types were used as input to an ASR system, and the resulting phoneme scores were compared with listening experiments that used the same speech data. With conventional training, the highest correlation between predicted and measured recognition was observed for perceptual linear prediction (PLP) features (r = 0.84). Second, a new training paradigm for ASR is proposed to improve the prediction of phoneme intelligibility. In this perceptual training, the original utterance labels are modified based on the confusions measured in HSR tests. The modified ASR training improved the overall prediction, with the best models (r = 0.89) exceeding those obtained with conventional training (r = 0.80).
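The two quantities the abstract relies on, a per-phoneme recognition score and a correlation r between predicted (ASR) and measured (HSR) scores, can be made concrete with a small sketch. It assumes results are stored as confusion counts (presented phoneme → response → count), scores each phoneme by its recognition rate, and takes r to be Pearson's correlation coefficient. The relabeling function is only one hypothetical reading of "perceptual training": each training label is replaced by a response drawn from the human confusion distribution for that phoneme. The abstract does not specify how labels are modified, so the function names (`recognition_rates`, `pearson_r`, `relabel_from_confusions`), the toy data, and the sampling rule are illustrative assumptions, not the authors' procedure.

```python
import math
import random


def recognition_rates(confusions):
    """Per-phoneme recognition rate: correct responses / all responses.

    confusions: dict mapping presented phoneme -> {response phoneme: count}.
    Rows with no responses are skipped to avoid division by zero.
    """
    rates = {}
    for phoneme, row in confusions.items():
        total = sum(row.values())
        if total > 0:
            rates[phoneme] = row.get(phoneme, 0) / total
    return rates


def pearson_r(x, y):
    """Pearson correlation between two equal-length score sequences."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("need two sequences of equal length >= 2")
    mx = sum(x) / len(x)
    my = sum(y) / len(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)


def relabel_from_confusions(labels, confusions, rng=None):
    """Hypothetical perceptual relabeling: replace each label with a phoneme
    sampled from the human response distribution for that label.

    Labels without a (non-empty) row in the confusion matrix are kept as-is.
    """
    if rng is None:
        rng = random.Random(0)  # fixed seed so relabeling is reproducible
    out = []
    for label in labels:
        row = confusions.get(label)
        if not row or sum(row.values()) == 0:
            out.append(label)
            continue
        responses = list(row)
        weights = [row[r] for r in responses]
        out.append(rng.choices(responses, weights=weights, k=1)[0])
    return out


if __name__ == "__main__":
    # Toy confusion counts (invented for illustration only).
    hsr = {"p": {"p": 8, "b": 2}, "t": {"t": 9, "d": 1}, "k": {"k": 6, "g": 4}}
    asr = {"p": {"p": 7, "b": 3}, "t": {"t": 10}, "k": {"k": 5, "g": 5}}

    measured = recognition_rates(hsr)
    predicted = recognition_rates(asr)
    common = sorted(set(measured) & set(predicted))
    r = pearson_r([predicted[p] for p in common], [measured[p] for p in common])
    print(f"r = {r:.2f}")

    print(relabel_from_confusions(["p", "t", "k", "p"], hsr))
```

Sampling labels from the confusion distribution keeps the training targets' error statistics close to those of listeners; a deterministic alternative would be to replace a label only with its most frequent human response, at the cost of discarding less frequent confusions.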
Similar resources
DNN-Based Automatic Speech Recognition as a Model for Human Phoneme Perception
In this paper, we test the applicability of state-of-the-art automatic speech recognition (ASR) to predict phoneme confusions in human listeners. Phoneme-specific response rates are obtained from ASR based on deep neural networks (DNNs) and from listening tests with six normal-hearing subjects. The measure for model quality is the correlation of phoneme recognition accuracies obtained in ASR an...
Adapting machine translation models toward misrecognized speech with text-to-speech pronunciation rules and acoustic confusability
In the spoken language translation pipeline, machine translation systems that are trained solely on written bitexts are often unable to recover from speech recognition errors due to the mismatch in training data. We propose a novel technique to simulate the errors generated by an ASR system, using the ASR system’s pronunciation dictionary and language model. Lexical entries in the pronunciation...
Session 2pSCa: Speech Communication 2pSCa2. Improving automatic speech recognition by learning from human errors
This work presents a series of experiments that compare the performance of human speech recognition (HSR) and automatic speech recognition (ASR). The goal of this line of research is to learn from the differences between HSR and ASR, and to use this knowledge to incorporate new signal processing strategies from the human auditory system in automatic classifiers. A database with noisy nonsense u...
Development of a Computer-Aided Language Learning System for Mandarin – Tone Recognition and Pronunciation Error Detection
This paper reports on the continued activities towards the development of a computer-aided language learning system for teaching Mandarin to Germans. A method for f0 normalization based on maximum likelihood estimation and tone recognition was implemented. Furthermore, a method for detecting the pronunciation errors was tested by calculating the confidence distance between the first and second ...
Estimating Confusions in the ASR Channel for Improved Topic-based Language Model Adaptation
Human language is a combination of elemental languages/domains/styles that change across and sometimes within discourses. Language models, which play a crucial role in speech recognizers and machine translation systems, are particularly sensitive to such changes, unless some form of adaptation takes place. One approach to speech language model adaptation is self-training, in which a language mo...